MIS-Boost: Multiple Instance Selection Boosting 



Emre Akbas Bernard Ghanem Narendra Ahuja 

Department of Electrical and Computer Engineering 
Computer Vision and Robotics Lab 
Beckman Institute for Advanced Science and Technology 
University of Illinois at Urbana-Champaign, IL USA 61801 
{eakbas, bghanem2, ahuja}@vision.ai.uiuc.edu



Abstract 

In this paper, we present a new multiple instance learning (MIL) method, called 
MIS-Boost, which learns discriminative instance prototypes by explicit instance 
selection in a boosting framework. Unlike previous instance selection based MIL 
methods, we do not restrict the prototypes to a discrete set of training instances 
but allow them to take arbitrary values in the instance feature space. We also do 
not restrict the total number of prototypes and the number of selected-instances 
per bag; these quantities are completely data-driven. We show that MIS-Boost 
outperforms state-of-the-art MIL methods on a number of benchmark datasets. 
We also apply MIS-Boost to large-scale image classification, where we show that 
the automatically selected prototypes map to visually meaningful image regions. 

1 Introduction

Traditionally, supervised learning algorithms require labeled training data, where each training
instance is given a specific class label. The performance and learning capability of such algorithms
are affected by the correctness of these instance labels, especially when they are obtained through
human interaction. In some applications (e.g. object detection, semantic segmentation, and activity
recognition), ambiguities in human labeling may arise (e.g. when detection bounding boxes are not
accurately sized or positioned). Traditional supervised learning methods cannot easily resolve these
label ambiguities, which are inherently handled by multiple instance learning methods. These latter
methods are based on a significantly weaker assumption about the underlying labels of the training
instances. They do not assume the correctness of each individual instance label, but they do assume
label correctness at the level of groupings of training instances. Multiple instance learning (MIL) can
be viewed as a weakly supervised learning problem where the labels of sets of instances (known as
bags) are given, while the labels of the instances in each bag are unknown. In a typical binary MIL
setting, a negative bag contains instances that are all labeled negative, while a positive bag contains
at least one instance labeled positive. Since the instance labels in positive bags are unknown, a MIL
classifier seeks an optimal labeling scheme for the training instances so that the resulting labels of
the training bags are correct.
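
The binary bag-labeling rule described above can be illustrated with a minimal Python sketch. The function name `bag_label` and the toy bags are hypothetical and only serve to show the rule; in a real MIL problem the instance labels of positive bags are not observed.

```python
from typing import Sequence


def bag_label(instance_labels: Sequence[int]) -> int:
    """Return +1 if at least one instance is positive (+1), otherwise -1."""
    return 1 if any(label == 1 for label in instance_labels) else -1


# Hypothetical toy bags. The instance labels are listed only to illustrate
# the rule; a MIL learner sees the bag labels, not these instance labels.
bags = {
    "negative_bag": [-1, -1, -1],
    "positive_bag": [-1, 1, -1, -1],
}
labels = {name: bag_label(instances) for name, instances in bags.items()}
print(labels)
```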

Due to their ability to handle incomplete knowledge about instance labels of training data, MIL
methods have gained significant attention in the machine learning and computer vision communities.
In fact, the MIL framework manifests itself in numerous applications that span from text
categorization [1] and drug activity recognition [2] to many vision applications, where an image
(or video) is represented as a bag of instances (interest points, patches, or image segments), only a
subset of which are meaningful for the task in question (e.g. image classification). Recently, it has
been shown that MIL methods can achieve state-of-the-art performance in a variety of vision
applications including content-based image retrieval [3, 4, 5], image classification [6, 7, 8], activity
recognition [9], object tracking [10, 11], object detection [12, 13], and image segmentation [14].
However, label ambiguity and incompleteness lead to significant challenges in the MIL framework,
especially when positive bags are dominated by negative instances (e.g. an image of an airplane
dominated by patches of sky). It is not surprising to see empirical evidence that the use of traditional
supervised learning methods in MIL problems often leads to reduced accuracy [15]. Consequently,
much effort has been devoted to developing effective learning methods that exploit the structure of
MIL problems. We give a brief overview of the most recent and popular MIL methods next.

Related Work 

During the past two decades, many MIL methods have been proposed, with interest in MIL growing
significantly in recent years, especially within the machine learning and vision communities.
The work in [16] is one of the earliest papers to address the MIL problem, where it was cast
in the framework of recognizing handwritten numerals. Over the next twenty years, the MIL literature
has abounded with algorithms that differ in two main respects: (1) the level at which the labeling is
determined, i.e. instance-level (bottom-up) or bag-level (top-down), and (2) the type of data modeling
assumed, i.e. generative vs. discriminative.

(1) Bottom-up vs. Top-down: While most MIL approaches address the problem of predicting
the class of a bag directly, without inferring the labels of the instances that belong to this bag, some
approaches use max-margin techniques to do this inference [1, 17, 18]. The latter approaches exploit
the fact that the instances of negative bags have negative labels (-1) and at least one instance in
each positive bag has a positive label (+1). For example, a MIL version of SVM is proposed in [1],
where the traditional SVM optimization problem is transformed into a mixed-integer program and
subsequently solved by alternating between solving a traditional SVM problem and heuristically
choosing the positive instances for each positive bag.

(2) Generative vs. Discriminative: Some MIL approaches are generative in nature, since they
assume that the underlying instances conform to a certain structure. An early generative approach is
based on finding an optimal hyper-rectangle discriminant in the instance space [2]. Other prominent
generative MIL approaches are based on the notion of diverse density (DD) [19]. These approaches
seek a "concept" instance¹ that is close to at least one instance of each positive bag and far away
from all the instances in the negative bags. In other words, a concept instance is a vector in
instance space that best describes the positive bags and discriminates them from the negative bags.
The existence of such a concept assumes that positive instances are compactly clustered and well
separated from negative instances. Such an assumption is strict and does not always hold in natural
data, which tends to be multi-modal. DD-based MIL approaches compute this optimal concept by
formulating the problem in a maximum likelihood framework using a noisy-OR model of the
likelihood. Improvements on the original DD formulation have been made, where the EM algorithm is
used to find the concept instance in [3] and multiple concepts are estimated in [20]. Moreover,
standard supervised learning techniques, such as kNN, linear and kernel SVM, AdaBoost, and Random
Forests, have been adapted to the MIL problem, leading to citation kNN [21], MI-Kernel [22],
MIGraph/miGraph [6], mi/MI/DD-SVM [1, 23], MILBoost [24], MI-Winnow [4], MI logistic
regression [25], and most recently MIForests [11]. Furthermore, some MIL approaches actively seek
instances in the training set (denoted prototypes) that carry discriminative power between the
positive and negative classes. In what follows, we will denote these as instance selection MIL methods.
Such approaches transform the original feature space into another space defined by the selected
prototypes (e.g. using bag-to-instance similarities) and subsequently apply standard supervised learning
techniques in the new space. In [7], all training instances are selected to be prototypes, while only
one instance per bag is systematically initialized, greedily updated, and selected in [26, 6].
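
The representation step shared by instance selection methods can be sketched in a few lines of Python. This is an illustrative assumption rather than the exact mapping used by any cited method: the function name `embed_bag` is hypothetical, and a plain Euclidean minimum distance stands in for the bag-to-instance similarity (a similarity could be obtained from it by any decreasing function).

```python
import math


def embed_bag(bag, prototypes):
    """Map a variable-size bag to a fixed-length vector with one entry per
    prototype: the distance from that prototype to its nearest instance."""
    return [min(math.dist(p, x) for x in bag) for p in prototypes]


# Hypothetical 2-D instances and prototypes.
prototypes = [(0.0, 0.0), (3.0, 0.0)]
bag_a = [(0.0, 0.0), (3.0, 4.0)]
bag_b = [(3.0, 1.0)]

# Both bags now live in the same 2-D space, regardless of their sizes, so a
# standard supervised learner can be trained on the embedded vectors.
print(embed_bag(bag_a, prototypes))
print(embed_bag(bag_b, prototypes))
```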

Our proposed instance selection method (dubbed Multiple Instance Selection Boost, or MIS-Boost)
is inspired by the instance selection MIL methods mentioned above (e.g. MILES [7, 8] and MILIS
[6]) and the MILBoost algorithm in [24]. As hinted earlier, instance selection MIL methods
comprise two fundamental stages. (i) In the representation stage, the original training bags are
represented in a new feature space determined by the selected prototypes. (ii) In the classification stage,
a supervised learning technique is used to build a classifier in the new feature space to optimize a
given classification cost. Most of these methods treat the two stages independently and sequentially,
in such a way that representation (i.e. prototype selection) is unaffected by the class label distribution.
Here, MILIS is an exception, since it iteratively selects prototypes from the training set to minimize
classification cost. This selection is further restricted, since only one prototype is selected per
training bag. We consider this to be a restriction because we believe that the prototype selection process
should be data dependent. For example, in the case of image classification, some "simple" object
classes (e.g. airplane) may yield a smaller number of prototypes than other more "complex" classes
(e.g. bicycle).

¹ This instance does not have to be one of the instances in the training set.

Compared to previous methods, prototypes selected by MIS-Boost do not necessarily belong to
the given training set and the number of these prototypes is not predefined, since they are determined 
in a data-driven fashion (boosting). Since the search space for prototype instances is no longer 
limited, more discriminative and possibly fewer prototypes can be learned. This learning process 
directly involves minimization of the final classification cost. As such, MIS-Boost learns a new 
representation based on the estimated prototypes, in a boosting framework. This leads to an iterative 
algorithm, which learns prototype-based base classifiers that are linearly combined. At each iteration 
of MIS-Boost, a prototype is learned so that it maximally discriminates between the positive and 
negative bags, which are themselves weighted according to how well they were discriminated in 
earlier iterations. The number of prototypes is determined in a data-driven way by cross-validation. 
Experiments on benchmark datasets show that MIS-Boost achieves state-of-the-art performance. 
When applied to image classification, MIS-Boost selects prototypes that map to meaningful image 
segments (e.g. class specific object parts). 

In Section 2, we give a detailed description of the MIS-Boost algorithm, including our proposed
instance selection/learning method. In Section 3, we show that MIS-Boost achieves or improves upon
state-of-the-art results on benchmark MIL datasets and popular image classification datasets.

2 Proposed Algorithm 

Given a training set $T = \{(B_1, y_1), (B_2, y_2), \ldots, (B_N, y_N)\}$, where $B_i$ is the $i$-th
bag and $y_i \in \{-1, +1\}$ is its label, our goal is to learn a bag classifier
$F : \mathcal{B} \to \{-1, +1\}$. Each bag consists of an arbitrary number of instances. The number of
instances in bag $B_i$ is denoted by $n_i$, so we have $B_i = \{x_{i1}, x_{i2}, \ldots, x_{in_i}\}$,
where each instance $x_{ij} \in \mathbb{R}^d$ for all $i, j$. We propose the following additive model
as our bag classification function:

$$F(B) = \mathrm{sign}\left( \sum_{m=1}^{M} f_m(B) \right), \qquad (1)$$

where each $f_m(B)$, called a base classifier, is associated with a prototype instance
$p_m \in \mathbb{R}^d$. The function $f_m : \mathcal{B} \to [-1, 1]$ is a bag classifier, like $F$, and
it returns a score between -1 and 1, which quantifies the "existence" of the prototype instance $p_m$
within bag $B$. The "existence" of $p_m$ within bag $B_i$ is determined by the distance from $p_m$ to
the instance of $B_i$ that is closest to $p_m$, that is:

$$D(p_m, B_i) = \min_{j} d(p_m, x_{ij}), \qquad (2)$$

where $d(\cdot, \cdot)$ is a distance function between two instances, which we take to be Euclidean.
We refer to $D(\cdot, \cdot)$ as the instance-to-bag distance function. Here, we note that this
instance-to-bag distance is used in other instance selection MIL methods (e.g. MILES and MILIS);
however, the prototype $p_m$ in these methods is restricted to a discrete subset of the training
samples. By removing this restriction on $p_m$ and allowing it to take arbitrary values in
$\mathbb{R}^d$, more discriminative and possibly fewer prototypes can be learned. The function
$f_m(\cdot)$ computes the instance-to-bag distances first, and classifies the bags using these
distances. Although $f_m(\cdot)$ can take any suitable form, we opt to use the simple scaled and
shifted sigmoid function, parameterized by $\beta_0$, $\beta_1$, and $p_m$:

$$f_m(B) = \frac{2}{1 + e^{-(\beta_1 D(p_m, B) + \beta_0)}} - 1. \qquad (3)$$
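
As a small illustration of Eqs. (2) and (3), the following Python sketch evaluates one base classifier on two toy bags. The function names, the 2-D instances, and the parameter values ($\beta_0 = 2$, $\beta_1 = -2$) are assumptions chosen for the example; a negative $\beta_1$ makes the score grow as the prototype gets closer to the bag.

```python
import math


def instance_to_bag_distance(prototype, bag):
    """Eq. (2): Euclidean distance from the prototype to its nearest instance."""
    return min(math.dist(prototype, x) for x in bag)


def base_classifier(prototype, bag, beta0, beta1):
    """Eq. (3): scaled and shifted sigmoid of the instance-to-bag distance.

    2 / (1 + exp(-z)) - 1 equals tanh(z / 2); the tanh form is used here
    because it cannot overflow for large |z|.
    """
    z = beta1 * instance_to_bag_distance(prototype, bag) + beta0
    return math.tanh(0.5 * z)


# Hypothetical values for illustration only.
prototype = (0.0, 0.0)
bag_near = [(0.1, 0.0), (5.0, 5.0)]   # contains an instance close to the prototype
bag_far = [(4.0, 3.0), (6.0, 8.0)]    # every instance is far from the prototype
beta0, beta1 = 2.0, -2.0

score_near = base_classifier(prototype, bag_near, beta0, beta1)
score_far = base_classifier(prototype, bag_far, beta0, beta1)
print(score_near, score_far)
```

With these values the score is positive for the bag that contains an instance near the prototype and negative for the other bag.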

Since $F$ is an additive model, we use additive boosting to learn its base classifiers [27]. We call our
algorithm multiple instance selection boosting, or MIS-Boost for short. One of the main reasons for
choosing boosting for classification is its ability to select a suitable number of prototypes in a
data-driven fashion. By performing cross-validation, not only does the classifier avoid overfitting on the
training set, but it also automatically determines the number of base classifiers (i.e. the number of
prototypes) needed to form $F$. Note that other instance selection methods predefine or fix the number
of prototypes that are used. Among the many variants of boosting, we choose Gentle-AdaBoost for
its numerical stability properties [27].

2.1 Learning base classifier fm 

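At each iteration of Gentle-AdaBoost, a weighted least-squares problem must be solved (step 2(a),
Algorithm 4 in [27]). In our formulation, the following error should be minimized:

$$\arg\min_{p_m, \beta_0, \beta_1} \epsilon_m, \quad \text{where} \quad \epsilon_m = \sum_{i=1}^{N} w_i \left[ y_i - \frac{2}{1 + e^{-(\beta_1 D(p_m, B_i) + \beta_0)}} + 1 \right]^2. \qquad (4)$$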

Here, $w_i$ is the weight of the $i$-th bag at the current iteration. The main difficulty in optimizing
the cost function above is that the instance-to-bag distance term $D(p_m, B)$ involves the
non-differentiable "min" function. It is this same function that forces other instance selection methods
(e.g. MILES and MILIS) to restrict the prototype search space to a subset of the training samples.
For example, MILES considers all training samples as valid prototypes, which makes learning the
classifier ($\ell_1$ SVM) computationally expensive. On the other hand, MILIS takes a
brute-force approach to prototype selection by greedily choosing one instance from each training
bag as a valid prototype. Although selection is done so that an overall classification cost is iteratively
reduced, this selection strategy highly restricts the feasible prototype space. To alleviate the problem
of non-differentiability in our formulation, we replace "min" in $D(p_m, B_i)$ with a differentiable
approximation (known as "soft-min") to form the soft instance-to-bag distance $\tilde{D}(p_m, B_i)$.
By setting $\alpha$ to a large positive constant, we have:

$$D(p_m, B_i) \approx \tilde{D}(p_m, B_i) = \sum_{j=1}^{n_i} \pi_j \, d(p_m, x_{ij}), \quad \text{where} \quad \pi_j = \frac{e^{-\alpha d(p_m, x_{ij})}}{\sum_{k=1}^{n_i} e^{-\alpha d(p_m, x_{ik})}}. \qquad (5)$$
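
The behavior of the soft-min in Eq. (5) can be checked numerically with a short Python sketch. The function name `soft_min`, the toy distances, and the values of $\alpha$ are assumptions for illustration; subtracting the smallest distance before exponentiating is a standard stabilization trick that leaves the weights $\pi_j$ unchanged.

```python
import math
from typing import Sequence


def soft_min(distances: Sequence[float], alpha: float) -> float:
    """Soft-min of Eq. (5): sum_j pi_j * d_j with pi_j proportional to exp(-alpha * d_j)."""
    d_min = min(distances)
    # Shifting by d_min cancels in the ratio that defines pi_j, but keeps the
    # largest exponent at zero so that the weights cannot all underflow.
    weights = [math.exp(-alpha * (d - d_min)) for d in distances]
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, distances)) / total


# Hypothetical distances from one prototype to the three instances of a bag.
distances = [0.5, 0.7, 3.0]
for alpha in (0.0, 1.0, 10.0, 100.0):
    print(alpha, soft_min(distances, alpha))
```

At $\alpha = 0$ the soft-min reduces to the plain average of the distances, and it moves toward the true minimum as $\alpha$ grows, which is why a large $\alpha$ is needed for the approximation to be tight.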



Replacing $D(p_m, B)$ with $\tilde{D}(p_m, B)$ in Eq. (4) renders the cost function differentiable,
allowing for gradient descent optimization. However, it is not a convex cost function, so there is a
risk of settling into undesirable local minima. To alleviate this problem, we allow for multiple
initializations of $p_m$. Preferably, these initialization points should be sampled from the entire
instance feature space. For this purpose, we cluster all the training instances using $k$-means, and use
the cluster centers as initialization points for $p_m$. We minimize the cost in Eq. (4) using coordinate
descent. We start by initializing $p_m$ to a cluster center and optimize over the $(\beta_0, \beta_1)$
parameters. Then, we fix these parameters and optimize over $p_m$. We iterate this procedure until
convergence, that is, until the difference between successive errors becomes smaller than a given
threshold. The overall algorithm used to learn a base classifier is summarized in Algorithm 1.
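
The alternating minimization described above can be sketched in plain Python as follows. This is a minimal sketch under several assumptions that are not taken from the paper: the gradient of the soft distance is derived here by the chain rule, each block update uses a few backtracking gradient steps instead of an exact minimization, the starting values of the $\beta$ parameters are arbitrary, and only a single initialization of $p_m$ is shown instead of the loop over all cluster centers. All function names and toy data are hypothetical.

```python
import math


def soft_distance_and_grad(p, bag, alpha):
    """Soft-min distance of Eq. (5) and its gradient with respect to p."""
    dists = [math.dist(p, x) for x in bag]
    d_min = min(dists)
    w = [math.exp(-alpha * (d - d_min)) for d in dists]  # shifted for stability
    total = sum(w)
    pi = [wj / total for wj in w]
    d_soft = sum(pj * dj for pj, dj in zip(pi, dists))
    grad = [0.0] * len(p)
    for pj, dj, x in zip(pi, dists, bag):
        if dj == 0.0:
            continue  # the Euclidean distance has no gradient at zero; skip the term
        coeff = pj * (1.0 - alpha * (dj - d_soft)) / dj
        for k in range(len(p)):
            grad[k] += coeff * (p[k] - x[k])
    return d_soft, grad


def error_and_grads(p, betas, bags, labels, weights, alpha):
    """Weighted squared error of Eq. (4) with the soft distance, plus its gradients."""
    beta0, beta1 = betas
    err, g_p, g_b = 0.0, [0.0] * len(p), [0.0, 0.0]
    for bag, y, w in zip(bags, labels, weights):
        d, dd_dp = soft_distance_and_grad(p, bag, alpha)
        f = math.tanh(0.5 * (beta1 * d + beta0))  # same value as Eq. (3)
        err += w * (y - f) ** 2
        common = -2.0 * w * (y - f) * 0.5 * (1.0 - f * f)  # derivative w.r.t. z
        g_b[0] += common
        g_b[1] += common * d
        for k in range(len(p)):
            g_p[k] += common * beta1 * dd_dp[k]
    return err, g_p, g_b


def backtracking_step(objective, x, grad, step=1.0):
    """One gradient step; the step size is halved until the objective decreases."""
    current = objective(x)
    while step > 1e-12:
        candidate = [xi - step * gi for xi, gi in zip(x, grad)]
        if objective(candidate) < current:
            return candidate
        step *= 0.5
    return x  # no improving step was found; keep the current point


def fit_base_classifier(init_p, bags, labels, weights, alpha=5.0, tol=1e-8,
                        max_rounds=100, inner_steps=10):
    p, betas = list(init_p), [0.0, -1.0]  # arbitrary starting betas (assumption)
    args = (bags, labels, weights, alpha)
    prev_err = math.inf
    for _ in range(max_rounds):
        for _ in range(inner_steps):  # fix p, descend on (beta0, beta1)
            _, _, g_b = error_and_grads(p, betas, *args)
            betas = backtracking_step(lambda b: error_and_grads(p, b, *args)[0], betas, g_b)
        for _ in range(inner_steps):  # fix the betas, descend on p
            _, g_p, _ = error_and_grads(p, betas, *args)
            p = backtracking_step(lambda q: error_and_grads(q, betas, *args)[0], p, g_p)
        err = error_and_grads(p, betas, *args)[0]
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return p, betas, err


# Hypothetical toy problem: positive bags contain an instance near (1, 1).
bags = [[(1.0, 1.1), (4.0, 4.0)], [(0.9, 1.0), (-3.0, 2.0)],
        [(4.0, 4.2), (-3.0, -3.0)], [(5.0, 0.0), (-2.0, 2.5)]]
labels = [1, 1, -1, -1]
weights = [0.25] * 4
init_p = [0.5, 0.5]  # stands in for one k-means cluster center
initial_err = error_and_grads(init_p, [0.0, -1.0], bags, labels, weights, 5.0)[0]
p, betas, final_err = fit_base_classifier(init_p, bags, labels, weights)
print(initial_err, final_err)
```

Because every accepted step strictly decreases the objective, the error returned by this sketch can never exceed the error at the initialization. Running it from several initial points and keeping the best result would mirror the multiple-initialization strategy.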

2.2 Determining the number of base classifiers 

As the number of base classifiers increases, Gentle-AdaBoost tends to overfit on the training data.
In order to prevent this and determine the number of base classifiers automatically, we perform
4-fold cross validation, whereby we randomly split the training data into 4 equal-size pieces and use
3 pieces for training and the rest for validation. We run the algorithm for a large number of base
classifiers and pick the number which gives the least classification error on the validation set. We
give the pseudo-code of MIS-Boost in Algorithm 2.
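
The boosting loop with validation-based selection of the number of base classifiers can be sketched in Python as follows. Several parts are assumptions made to keep the example short: a single train/validation split is used instead of 4-fold cross validation, the base learner is a stand-in that picks the best prototype from a small candidate list with fixed, hypothetical $\beta$ values (so it is not the continuous optimization of Section 2.1), and a zero score is mapped to the positive class.

```python
import math


def min_distance(p, bag):
    return min(math.dist(p, x) for x in bag)


def fit_stand_in_base(bags, labels, weights, candidates, beta0=2.0, beta1=-4.0):
    """Stand-in base learner: choose the candidate prototype with the least
    weighted squared error. The beta values are fixed, hypothetical constants."""
    def make(p):
        return lambda bag: math.tanh(0.5 * (beta1 * min_distance(p, bag) + beta0))

    def error(f):
        return sum(w * (y - f(b)) ** 2 for b, y, w in zip(bags, labels, weights))

    return min((make(c) for c in candidates), key=error)


def predict(learners, bag):
    """Sign of the summed base-classifier scores; a zero sum maps to +1."""
    return 1 if sum(f(bag) for f in learners) >= 0 else -1


def mis_boost(train, valid, candidates, max_rounds):
    """Gentle-AdaBoost-style loop. Returns the base classifiers up to the round
    with the least validation error, and the validation error of every round."""
    bags, labels = zip(*train)
    n = len(bags)
    weights = [1.0 / n] * n
    learners, val_errors = [], []
    for _ in range(max_rounds):
        f = fit_stand_in_base(bags, labels, weights, candidates)
        learners.append(f)
        # Reweight: bags that f scores well lose weight, the others gain weight.
        weights = [w * math.exp(-y * f(b)) for b, y, w in zip(bags, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        val_errors.append(sum(predict(learners, b) != y for b, y in valid) / len(valid))
    best_m = val_errors.index(min(val_errors)) + 1  # earliest round with least error
    return learners[:best_m], val_errors


# Hypothetical toy data: positive bags contain an instance near the origin.
train = [([(0.1, 0.0), (3.0, 3.0)], 1), ([(0.0, 0.2), (5.0, 1.0)], 1),
         ([(3.0, 3.0), (4.0, 4.0)], -1), ([(5.0, 1.0), (3.2, 2.9)], -1)]
valid = [([(0.05, 0.1), (4.0, 4.0)], 1), ([(3.1, 3.0), (5.0, 5.0)], -1)]
candidates = [(0.0, 0.0), (3.0, 3.0)]

selected, val_errors = mis_boost(train, valid, candidates, max_rounds=5)
print(len(selected), val_errors)
```

Taking the earliest round that attains the least validation error keeps the final classifier as small as possible, which is one reasonable way to break ties.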

Algorithm 1: Learning $f_m$ (pseudo-code for learning a base classifier)

Input: Training set $\{(B_1, y_1), (B_2, y_2), \ldots, (B_N, y_N)\}$, weights $w_i$, $i = 1, 2, \ldots, N$,
       cluster centers $\{c_1, c_2, \ldots, c_K\}$, error tolerance Tol.
Output: Base classifier $f_m(x)$

    // Initialize $p_m$ to each cluster center in turn
    for $p_m^0 = c_1, c_2, \ldots, c_K$ do
        error(-1) $\leftarrow \infty$
        $(\beta_0^0, \beta_1^0) \leftarrow \arg\min_{\beta_0, \beta_1} \epsilon_m \big|_{p_m = p_m^0}$    {Fix $p_m$ and minimize over the $\beta$'s}
        error(0) $\leftarrow \epsilon_m(p_m^0, \beta_0^0, \beta_1^0)$
        $t \leftarrow 0$
        while |error($t$) - error($t-1$)| > Tol do
            $t \leftarrow t + 1$
            $p_m^t \leftarrow \arg\min_{p_m} \epsilon_m \big|_{\beta_0 = \beta_0^{t-1}, \beta_1 = \beta_1^{t-1}}$    {Fix $(\beta_0, \beta_1)$ and minimize over $p_m$}
            $(\beta_0^t, \beta_1^t) \leftarrow \arg\min_{\beta_0, \beta_1} \epsilon_m \big|_{p_m = p_m^t}$    {Fix $p_m$ and minimize over the $\beta$'s}
            error($t$) $\leftarrow \epsilon_m(p_m^t, \beta_0^t, \beta_1^t)$
        end while
        Keep $(p_m^*, \beta_0^*, \beta_1^*)$ with the least error so far
    end for
    Set $(p_m, \beta_0, \beta_1) \leftarrow (p_m^*, \beta_0^*, \beta_1^*)$ and output $f_m$


Algorithm 2: MIS-Boost (pseudo-code for the MIS-Boost algorithm)

Input: Training set $\{(B_1, y_1), (B_2, y_2), \ldots, (B_N, y_N)\}$, maximum number of base classifiers $M$,
       number of clusters $K$
Output: Classifier $F(x)$

    Cluster all instances $x_{ij}$, $i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, n_i$, into $K$ clusters.
    Cluster centers are $\{c_1, c_2, \ldots, c_K\}$.
    Split the training set into train-set and validation-set.
    Weights $w_i \leftarrow 1/N$ for $i = 1, 2, \ldots, N$, and $F(x) \leftarrow 0$.
    for $m = 1, 2, \ldots, M$ do
        Learn a base classifier $f_m$ using Algorithm 1.
        Update $F(x) \leftarrow F(x) + f_m(x)$.
        Update $w_i \leftarrow w_i e^{-y_i f_m(B_i)}$ and normalize weights so that $\sum_i w_i = 1$.
        Evaluate $F(x)$ on the validation-set, compute validation-error($m$).
    end for
    $M \leftarrow \arg\min_m$ validation-error($m$)
    Output $F(x) = \mathrm{sign}\left( \sum_{m=1}^{M} f_m(x) \right)$


3 Experiments

In this section, we evaluate the performance of MIS-Boost on five different MIL benchmark datasets
and two COREL image classification datasets. We compare our performance to those of the most
recent and state-of-the-art MIL methods available for each dataset. In another experiment, we use
MIS-Boost in a large-scale image classification task and visualize samples of the instances that are
closest to the learned prototype(s). The results of this experiment suggest that the learned prototypes
are not only discriminative but also visually meaningful, that is, they are similar to the parts of the
image that are relevant for classification.

3.1 Benchmark MIL datasets 

The drug activity prediction datasets "Musk1" and "Musk2", described in [2], and the image
datasets "Elephant", "Fox", and "Tiger", introduced in [1], have been widely used and have become
standard benchmark datasets for MIL methods. For each dataset, we perform 10-fold cross validation
and report the average per-fold test classification accuracy. This is the standard way of reporting
results on these datasets. In all our experiments in this section and the two following sections, we
set the number of clusters to K = 100, and the maximum number of base learners, or prototypes, to
M = 100.

We report our results in Table 1, where we also list the results of the most recent and state-of-the-art
MIL methods. To the best of our knowledge, this table gives the most comprehensive comparison
between MIL methods on the benchmark datasets. Clearly, MIS-Boost outperforms other methods
on all datasets except the "Tiger" class. There, we have the second best accuracy, with only a 0.5%
difference from the top performing method, miGraph [28]. Among all the methods in Table 1,
MIS-Boost is the most similar to MILES and MILIS, as they are also instance-selection-based methods.
Except on "Musk1", our algorithm significantly outperforms these two methods. We believe that
this improvement is largely due to the fact that our method, in contrast to MILES and MILIS, does
not restrict the prototypes to a subset of the training instances, as we discussed in Section 2.1.

Table 1: Percent classification accuracies of MIL algorithms on benchmark MIL datasets. Best
results are marked with an asterisk (*).

Method            Musk1    Musk2    Elephant    Fox      Tiger
MIS-Boost         90.3*    94.4*    89.0*       80.0*    85.5
MIForest [11]     85       82       84          64       82
MIGraph [28]      90.0     90.0     85.1        61.2     81.9
miGraph [28]      88.9     90.3     86.8        61.6     86.0*
MILBoost [24]     71       61       73          58       58
EM-DD [3]         84.8     84.9     78.3        56.1     72.1
DD [23]           88.0     84.0     N/A         N/A      N/A
MI-SVM [1]        77.9     84.3     81.4        59.4     84.0
mi-SVM [1]        87.4     83.6     82.0        58.2     78.9
MILES [7]         88       83       81          62       80
MILIS [29]        88       83       81          62       80
MI-Kernel [22]    88       89       84          60       84
AW-SVM [30]       86       84       82          64       83
AL-SVM [30]       86       83       79          63       78
MissSVM [18]      87.6     80.0     N/A         N/A      N/A

Table 2: Percent classification accuracies on the COREL-1000 and COREL-2000 datasets.

Method           COREL-1000    COREL-2000
MIS-Boost        84.2          70.8
MILIS [29]       83.8          70.1
MILES [7]        82.3          68.7
MIForest [11]    82            69



3.2 COREL dataset 



The COREL-2000 image classification dataset [7] contains 2000 images in 20 classes. COREL-1000
is a subset of this dataset containing the first 10 classes. We use the same features and
experimental settings as in [7], and train one-vs-all MIS-Boost classifiers to deal with the multiclass
case. The results of MIS-Boost and the three most recent state-of-the-art methods are given in Table 2.
Our method outperforms the other methods on both datasets. To illustrate the data-driven nature of
our algorithm, we give the number of prototypes learned per class in Figure 1.
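
A one-vs-all scheme of this kind can be sketched in Python as follows. The text does not state how the binary outputs are combined, so the rule used here (predict the class whose classifier gives the largest real-valued score) is an assumption, as are the class names and the toy scorers, which simply return the negative distance to a single hypothetical class prototype.

```python
import math


def make_scorer(prototype):
    """Toy stand-in for a trained binary classifier: the score is higher when
    the bag contains an instance close to the class prototype."""
    return lambda bag: -min(math.dist(prototype, x) for x in bag)


def one_vs_all_predict(scorers, bag):
    """Return the class whose binary scorer gives the largest score."""
    return max(scorers, key=lambda name: scorers[name](bag))


# Hypothetical class names and prototypes.
scorers = {
    "beach": make_scorer((0.0, 0.0)),
    "bus": make_scorer((5.0, 5.0)),
    "horse": make_scorer((-4.0, 2.0)),
}
test_bag = [(4.8, 5.1), (1.5, 1.5)]
print(one_vs_all_predict(scorers, test_bag))
```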



3.3 PASCAL VOC 2007 



The image categorization task of PASCAL VOC is inherently a MIL problem, since the label of an
image indicates the existence of at least one object of that class within the image. However,
to the best of our knowledge, no MIL results have been reported on this dataset. This is probably
because of the large number of images ($\sim 10^4$) it contains. If we assume that each image has a few
hundred instances, then the total number of instances is on the order of millions.
Instance-selection-based methods like MILES would easily run into memory problems. In this section, we
evaluate the performance of MIS-Boost on this large-scale image classification dataset, and visualize
the selected instances, i.e. those instances that are closest to the learned prototypes, to see if they
overlap with the object(s) of interest. To this end, we run MIS-Boost on three selected classes from the
dataset: "aeroplane", "bicycle", and "tvmonitor".



[Figure 1 plot, titled "COREL-2000"; horizontal axis: classes (1 to 20).]

Figure 1: Number of base learners, or prototypes, per class as determined by MIS-Boost on the
COREL-2000 dataset.



To decide what type of instances (or features) to use, we did preliminary experiments with SIFT
keypoints/descriptors [31] and regions obtained by the segmentation algorithm of [32]. Regions gave
better results (0.55 average precision (AP)²) than the SIFT descriptors (0.38 AP) on the "aeroplane"
class, so we decided to use regions.
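
For readers unfamiliar with the measure, the following Python sketch computes an 11-point interpolated average precision, which is the form of AP used in the PASCAL VOC 2007 evaluation. The function name and the toy scores are hypothetical, and the official development kit should be used for any reported numbers.

```python
def average_precision_11pt(scores, labels):
    """11-point interpolated AP. `labels` holds +1 for positives, -1 for negatives."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("at least one positive example is required")
    # Rank examples by decreasing classifier score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_pos = 0
    precisions, recalls = [], []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
        precisions.append(true_pos / rank)
        recalls.append(true_pos / n_pos)
    # Average, over recall levels 0, 0.1, ..., 1, of the best precision
    # achieved at any recall greater than or equal to that level.
    total = 0.0
    for k in range(11):
        level = k / 10
        total += max((p for p, r in zip(precisions, recalls) if r >= level), default=0.0)
    return total / 11


# Hypothetical scores for three images and their ground-truth labels.
print(average_precision_11pt([0.9, 0.8, 0.1], [1, 1, -1]))
print(average_precision_11pt([0.9, 0.8, 0.7], [-1, 1, 1]))
```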

This dataset is not only large but also unbalanced. The "aeroplane" class has 442 positive vs. 9518
negative images (bags); the "bicycle" class has 482 positive vs. 9458 negative; and "tvmonitor" has
485 positive vs. 9429 negative. This imbalance makes finding discriminative instances in the
positive bags, among the clutter from the relatively large number of negative bags, quite challenging.

MIS-Boost yields an average precision (AP) score of 0.55 for the "aeroplane" class. This score is
significantly below the state of the art (e.g. 0.76 in [33]) on that dataset. We believe that this
discrepancy is largely due to the fact that MIL methods do not model the context (or the background),
while [33] and other similar approaches do so. Although the MIL approach seems to be the most
appropriate one for the PASCAL dataset (given how the ground truth is formed), these results suggest
that the context/background information is highly discriminative. Another reason for the
discrepancy might be the instance features we use; namely, the segmentation might fail to capture the object
or its parts. MIS-Boost yields an AP of 0.28 on "bicycle" (compared to 0.65 in [33]) and 0.36 on
"tvmonitor" (compared to 0.52 in [33]).

Next, we visualize the selected instances in each image that are closest to the learned prototypes.
Figure 2 gives examples of true positives from the "aeroplane" class. On each image, we show the top
three instances, i.e. regions, that are closest to the top 3 prototypes learned by MIS-Boost. These
regions make the highest contribution to the correct classification of their images. Similarly, Figure
3 and Figure 4 present the same for the "bicycle" and "tvmonitor" classes, respectively. As one
can observe from the images, the most discriminative regions usually overlap with the object of
interest. Occasionally, some wrong instances are selected, as shown in the lower-middle and
lower-right images of Figure 4. Apparently, the "tvmonitor" classifier learned square-shaped or
frame-shaped prototypes, and the window in the lower-middle image and the door in the lower-right image
are good selections for this prototype. We will make these challenging PASCAL datasets publicly
available (online), in a MIL format, along with the code needed to visualize selected instances. We
hope that these larger and more challenging datasets are used to compare MIL methods in the future.
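
The selection of regions for visualization can be sketched in Python as follows. The function name and the 2-D toy features are hypothetical, and the list of prototypes is simply assumed to be given, since the text does not spell out how the "top" prototypes are ranked.

```python
import math


def closest_regions(bag, prototypes):
    """For each prototype, return (index, distance) of the nearest instance in the bag."""
    result = []
    for p in prototypes:
        j = min(range(len(bag)), key=lambda idx: math.dist(p, bag[idx]))
        result.append((j, math.dist(p, bag[j])))
    return result


# Hypothetical region features of one image and three learned prototypes.
image_regions = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
top_prototypes = [(0.9, 1.2), (4.0, 5.0), (0.1, -0.1)]
selection = closest_regions(image_regions, top_prototypes)
print(selection)
```

The returned indices identify which regions of the image would be outlined for each prototype.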

Figure 2: Example true positives from the "aeroplane" class (i.e. these images contain at least
one instance of aeroplane). On each image, the three regions that are most similar to the top three
prototypes learned by MIS-Boost are shown with yellow boundaries (best viewed in color).

Figure 3: Example true positives from the "bicycle" class. See the caption of Figure 2 for an
explanation of the yellow boundaries (best viewed in color).

Figure 4: Example true positives from the "tvmonitor" class. See the caption of Figure 2 for an
explanation of the yellow boundaries (best viewed in color).

² This is the standard performance measure used in PASCAL VOC.

4 Conclusion

We presented a new multiple instance learning (MIL) method that learns discriminative instance
prototypes by explicit instance selection in a boosting framework. We argued that the following
three design choices and/or assumptions restrict the capacity of a MIL method: (i) treating the
prototype learning/choosing step and learning the final bag classifier independently, (ii) restricting the
prototypes to a discrete set of instances from the training set, and (iii) restricting the number of
selected instances per bag. Our method, MIS-Boost, overcomes all three restrictions by learning
prototype-based base classifiers that are linearly boosted. At each iteration of MIS-Boost, a
prototype is learned so that it maximally discriminates between the positive and negative bags, which are
themselves weighted according to how well they were discriminated in earlier iterations. The
total number of prototypes and the number of selected instances per bag are determined in a completely
data-driven way. We showed that our method outperforms state-of-the-art MIL methods on a
number of benchmark datasets. We also applied MIS-Boost to large-scale image classification, where
we showed that the automatically selected prototypes map to visually meaningful image regions.